Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Janz, David, Hron, Jiri, Mazur, Przemysław, Hofmann, Katja, Hernández-Lobato, José Miguel, Tschiatschek, Sebastian

Neural Information Processing Systems

Posterior sampling for reinforcement learning (PSRL) is an effective method for balancing exploration and exploitation in reinforcement learning. Randomised value functions (RVF) can be viewed as a promising approach to scaling PSRL. However, we show that most contemporary algorithms combining RVF with neural network function approximation do not possess the properties which make PSRL effective, and provably fail in sparse reward problems. Moreover, we find that propagation of uncertainty, a property of PSRL previously thought important for exploration, does not preclude this failure. We use these insights to design Successor Uncertainties (SU), a cheap and easy to implement RVF algorithm that retains key properties of PSRL. SU is highly effective on hard tabular exploration benchmarks. Furthermore, on the Atari 2600 domain, it surpasses human performance on 38 of 49 games tested (achieving a median human normalised score of 2.09), and outperforms its closest RVF competitor, Bootstrapped DQN, on 36 of those.
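
The exploration scheme the abstract summarises is Thompson-sampling-style: draw one value function from a posterior and act greedily under that draw for a whole episode. Below is a minimal illustrative sketch of that loop; the toy Gaussian posterior and all names here are stand-ins for exposition, not the paper's implementation.

    import numpy as np

    # Minimal sketch of randomised-value-function exploration in the PSRL
    # style: one posterior draw per episode, greedy actions under the draw.
    # The Gaussian posterior below is a toy stand-in, not the SU model.

    class ToyQPosterior:
        def __init__(self, n_states, n_actions, rng):
            self.mean = np.zeros((n_states, n_actions))
            self.std = np.ones((n_states, n_actions))
            self.rng = rng

        def sample(self):
            # A single coherent draw, kept fixed for the whole episode.
            return self.mean + self.std * self.rng.standard_normal(self.mean.shape)

    rng = np.random.default_rng(0)
    posterior = ToyQPosterior(n_states=10, n_actions=2, rng=rng)
    for episode in range(3):
        q = posterior.sample()              # sample once per episode
        state = 0                           # hypothetical initial state
        action = int(np.argmax(q[state]))   # act greedily under the sample
        # ... step the environment, record transitions, update the posterior ...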



Reviews: Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Neural Information Processing Systems

This paper proposes using Bayesian linear regression to obtain a posterior over successor features as a way of representing uncertainty, from which the authors sample for exploration. I found the characterization of Randomised Policy Iteration strange, as it seems to apply to UBE but not to Bootstrapped DQN. With Bootstrapped DQN, each model in the ensemble is a value function pertaining to a different policy, so there is no single reference policy: the ensemble is trying to represent a distribution over optimal value functions, rather than value functions for a single reference policy.

Proposition 1: in the case of neural networks, and function approximation in general, it is very unlikely that we will get a factored distribution, so this claim does not seem applicable in general. In fact, there should typically be very high correlation between the Q-values of nearby states. Is this claim a direct response to UBE? Also, the analysis fixes the policy in order to consider the distribution over value functions, but this does not seem to be how posterior sampling is normally considered; rather, it is only how UBE considers it.
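
For concreteness, the construction this review describes follows from the standard successor-feature identities: with Q(s, a) = ψ(s, a)ᵀw and a Gaussian posterior over the reward weights w from Bayesian linear regression, the induced Q-posterior is jointly Gaussian with covariance ψΣψᵀ, whose off-diagonal entries are generally non-zero. A toy numerical check (all quantities illustrative):

    import numpy as np

    # Toy check that a Gaussian posterior over reward weights w, combined
    # with successor features psi, induces *correlated* Q-values:
    # Cov[Q(s,a), Q(s',a')] = psi(s,a) @ Sigma @ psi(s',a').

    rng = np.random.default_rng(0)
    d = 4                               # feature dimension (illustrative)
    psi = rng.random((3, d))            # successor features for 3 (s, a) pairs
    mu, Sigma = np.zeros(d), np.eye(d)  # stand-in posterior over reward weights

    q_mean = psi @ mu                   # posterior mean of each Q-value
    q_cov = psi @ Sigma @ psi.T         # full covariance; off-diagonals are
                                        # non-zero, so the Q-posterior does
                                        # not factorise across state-actions
    print(q_cov)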


Reviews: Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Neural Information Processing Systems

From the discussion, the reviewers appreciated the clarifications made in the rebuttal. They have indicated what they would like to see improved in a revised version, in particular a clearer presentation.



Successor Uncertainties: exploration and uncertainty in temporal difference learning

Janz, David, Hron, Jiri, Hernández-Lobato, José Miguel, Hofmann, Katja, Tschiatschek, Sebastian

arXiv.org Machine Learning

We consider the problem of balancing exploration and exploitation in sequential decision making problems. To explore efficiently, it is vital to consider the uncertainty over all consequences of a decision, and not just those that follow immediately; the uncertainties involved need to be propagated according to the dynamics of the problem. To this end, we develop Successor Uncertainties, a probabilistic model for the state-action value function of a Markov Decision Process that propagates uncertainties in a coherent and scalable way. We relate our approach to other classical and contemporary methods for exploration and present an empirical analysis.
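
The propagation the abstract refers to can be written in standard successor-feature notation (our reconstruction from the usual definitions, not text from the paper). With rewards modelled as r(s, a) = φ(s, a)ᵀw:

    % Successor features: discounted expected future features under \pi.
    \psi^\pi(s,a) \;=\; \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t \phi(s_t, a_t) \,\middle|\, s_0 = s,\ a_0 = a\right],
    \qquad
    Q^\pi(s,a) \;=\; \psi^\pi(s,a)^\top w.

    % The successor features satisfy a Bellman recursion,
    \psi^\pi(s,a) \;=\; \phi(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a),\, a' \sim \pi(\cdot \mid s')}\!\left[\psi^\pi(s', a')\right],
    % so posterior uncertainty over w is carried through the dynamics
    % to the Q-values at every state-action pair.

A Gaussian posterior over w then yields a jointly Gaussian Q-posterior that is consistent with this recursion, which is the sense in which the uncertainties are propagated coherently.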